When can Multi-Site Datasets be Pooled for Regression? Hypothesis Tests, ℓ2-consistency and Neuroscience Applications (Supplementary Material)
Abstract
Remarks on transformations in the pre-processing step: For all i ∈ {1, ..., k}, after applying the transformation (shift correction), we pool the pairs (X_i, y_i) to estimate β∗. In general, the transformation (shift correction) should not depend on the responses y_i; otherwise it introduces a dependence on the noise. To see this, note that y_i = X_i β_i + ε_i, where X_i is the transformed set of features and ε_i is the noise. If the transformation depends on y_i, then X_i also depends on ε_i, which leads to poor estimates of β∗ (and β_i). When the transformation must involve y_i, a sensible strategy is to split each site's dataset into two parts: one part from each site is used to learn the transformation, and the other part (after applying the learned transformation) is pooled to estimate β∗ and to conduct our hypothesis test.
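The sample-splitting strategy described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the names `learn_shift` and `split_transform_pool` are hypothetical, the learned transformation is a simple placeholder (centering features by the mean of the learning half), the split is an even random split, and the pooled estimate of β∗ is taken to be ordinary least squares.

```python
import numpy as np


def learn_shift(X_learn, y_learn):
    """Hypothetical placeholder for a learned shift correction.

    It accepts y_learn so that a response-dependent transformation could be
    plugged in here; this simple version only centers the features.
    """
    mu = X_learn.mean(axis=0)
    return lambda X: X - mu


def split_transform_pool(sites, seed=0):
    """Split each site, learn the transform on one half, pool the other half.

    sites: list of (X, y) pairs, one per site, with X of shape (n_i, p).
    Returns the pooled least-squares estimate and the pooled data.
    """
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [], []
    for X, y in sites:
        n = X.shape[0]
        perm = rng.permutation(n)
        learn_idx, est_idx = perm[: n // 2], perm[n // 2:]
        # The transformation sees only the learning half, so the noise in
        # the estimation half never enters the transformed features.
        transform = learn_shift(X[learn_idx], y[learn_idx])
        X_parts.append(transform(X[est_idx]))
        y_parts.append(y[est_idx])
    X_pool = np.vstack(X_parts)
    y_pool = np.concatenate(y_parts)
    # Assumption: ordinary least squares stands in for the pooled estimator.
    beta_hat, *_ = np.linalg.lstsq(X_pool, y_pool, rcond=None)
    return beta_hat, X_pool, y_pool
```

The point of the design is that only the estimation halves are pooled. The learning halves are spent on fitting the transformation, so any dependence of the transformation on y_i stays confined to samples that are not used for estimating β∗ or for the hypothesis test.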
Similar resources
When can Multi-Site Datasets be Pooled for Regression? Hypothesis Tests, ℓ2-consistency and Neuroscience Applications
Many studies in biomedical and health sciences involve small sample sizes due to logistic or financial constraints. Often, identifying weak (but scientifically interesting) associations between a set of predictors and a response necessitates pooling datasets from multiple diverse labs or groups. While there is a rich literature in statistical machine learning to address distributional shifts an...
Supplementary material of NSSRF: global network similarity search with subgraph signatures and its applications
This supplementary material presents the related descriptions, definitions, datasets, and experimental results of NSSRF. In addition, the limitation of NSSRF in terms of network size is discussed, using five PPI networks to query the random forest regression (RFR) model. Moreover, the practical memory usage of NSSRF on the four real-world datasets is illustrated in this supplementary manuscript.
Simultaneous robust estimation of multi-response surfaces in the presence of outliers
A robust approach should be considered when estimating regression coefficients in multi-response problems. Many models are derived from the least squares method. Because the presence of outlier data is unavoidable in most real cases and because the least squares method is sensitive to these types of points, robust regression approaches appear to be a more reliable and suitable method for addres...
Nonparametric Learning in High Dimensions
This thesis develops flexible and principled nonparametric learning algorithms to explore, understand, and predict high dimensional and complex datasets. Such data appear frequently in modern scientific domains and lead to numerous important applications. For example, exploring high dimensional functional magnetic resonance imaging data helps us to better understand brain functionalities; infer...
Integrating Local and Global Error Statistics for Multi-Scale RBF Network Training: An Assessment on Remote Sensing Data
BACKGROUND This study discusses the theoretical underpinnings of a novel multi-scale radial basis function (MSRBF) neural network along with its application to classification and regression tasks in remote sensing. The novelty of the proposed MSRBF network relies on the integration of both local and global error statistics in the node selection process. METHODOLOGY AND PRINCIPAL FINDINGS The ...